
    Fast Information Retrieval in the Open Grid Service Architecture

    Information retrieval offers resource discovery mechanisms for unstructured information and has therefore been identified as a standardization goal by the Open Grid Forum. We argue that integrating information retrieval into the grid infrastructure is not merely an interesting prospect for grid users but a necessity, because the batch-processing approach supported by the Open Grid Services Architecture is at odds with the requirements of online query processing: staging the search indices to an allocated compute node to answer sporadic but frequent search queries is prohibitively expensive. We advocate web services as a cross-site messaging mechanism and discuss the alternatives. To investigate, we designed and built a prototype system for grid image retrieval. The statelessness and isolation of web services proved problematic for our purposes, but we present a software architecture that overcomes these issues efficiently.

    Parallel Retrieval of Dense Vectors in the Vector Space Model

    Modern information retrieval systems use distributed and parallel algorithms to meet their operational requirements and commonly operate on sparse vectors, but dimensionality-reduction techniques produce dense, relatively short feature vectors. Motivated by the relevance of dense vectors, we have parallelized the vector space model for dense matrices and vectors. Our algorithm uses a hybrid partitioning that splits both documents and features, and it operates on a mesh of hosts holding a block-partitioned corpus matrix. We show that the theoretical speed-up is optimal. The empirical evaluation of an MPI-based implementation reveals a super-linear speed-up on a cluster of Nehalem Xeon CPUs.
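    To illustrate the hybrid partitioning idea, here is a minimal sequential sketch, not the authors' MPI implementation. It assumes a documents-by-features corpus matrix cut into a P x Q grid of blocks: host (i, j) scores the documents of row block i against the query restricted to feature block j, and the Q partial results of each mesh row are then sum-reduced. Scoring by cosine similarity, and the helper names `split_ranges`, `host_partial`, `mesh_dot_products` and `cosine_scores`, are hypothetical choices made for this illustration.

```python
"""Sequential simulation of block-partitioned dense vector-space scoring.

Hypothetical sketch: a P x Q mesh of hosts is simulated with loops; a real
deployment would run each host as a separate MPI process.
"""
import math
from typing import List

Matrix = List[List[float]]


def split_ranges(n: int, parts: int) -> List[range]:
    """Split range(n) into `parts` contiguous, nearly equal chunks."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for k in range(parts):
        size = base + (1 if k < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def host_partial(corpus: Matrix, query: List[float],
                 rows: range, cols: range) -> List[float]:
    """Work of one host: partial dot products over its own feature columns."""
    return [sum(corpus[d][f] * query[f] for f in cols) for d in rows]


def mesh_dot_products(corpus: Matrix, query: List[float],
                      p: int, q: int) -> List[float]:
    """Dot product of `query` with every document, computed block-wise."""
    doc_blocks = split_ranges(len(corpus), p)    # mesh rows
    feat_blocks = split_ranges(len(query), q)    # mesh columns
    scores: List[float] = []
    for rows in doc_blocks:
        # The q hosts of this mesh row each produce a partial score vector.
        partials = [host_partial(corpus, query, rows, cols) for cols in feat_blocks]
        # Sum-reduce along the mesh row (an MPI reduce in a real system).
        scores.extend(sum(vals) for vals in zip(*partials))
    return scores


def cosine_scores(corpus: Matrix, query: List[float],
                  p: int, q: int) -> List[float]:
    """Cosine similarity; zero-norm vectors get a score of 0.0."""
    dots = mesh_dot_products(corpus, query, p, q)
    q_norm = math.sqrt(sum(x * x for x in query))
    result = []
    for dot, row in zip(dots, corpus):
        d_norm = math.sqrt(sum(x * x for x in row))
        result.append(dot / (d_norm * q_norm) if d_norm > 0 and q_norm > 0 else 0.0)
    return result


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    print(cosine_scores(docs, [1.0, 0.0], p=2, q=2))
```

    Because every host touches only its own block, the result is independent of the mesh shape; the choice of P and Q only changes how the work and the reduction traffic are distributed.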